Skip to content

feat(alibaba): support custom endpoint in OSSHook for VPC internal access#66512

Open
wenbinye wants to merge 2 commits intoapache:mainfrom
wenbinye:feat/oss-hook-vpc-endpoint
Open

feat(alibaba): support custom endpoint in OSSHook for VPC internal access#66512
wenbinye wants to merge 2 commits intoapache:mainfrom
wenbinye:feat/oss-hook-vpc-endpoint

Conversation

@wenbinye
Copy link
Copy Markdown

@wenbinye wenbinye commented May 7, 2026

Summary

OSSHook._get_client now reads extra.endpoint from the connection config, enabling VPC internal endpoints (e.g. oss-cn-hangzhou-internal.aliyuncs.com) instead of being limited to public endpoints.

Motivation

When Airflow pods run in the same VPC as Alibaba Cloud OSS, using the public endpoint incurs unnecessary public traffic. For clusters in the same region, the VPC internal endpoint (oss-{region}-internal.aliyuncs.com) provides lower latency and no egress charges.

Changes

  • oss.py: _get_client now checks for extra.endpoint before falling back to the default public endpoint
  • test_oss.py: Added test_get_client_default_endpoint and test_get_client_custom_endpoint
  • oss_mock.py: Updated mock_oss_hook_default_project_id to accept optional endpoint parameter

How to use

Set endpoint in the extra field of the OSS connection:

{
  "auth_type": "AK",
  "access_key_id": "...",
  "access_key_secret": "...",
  "region": "cn-hangzhou",
  "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com"
}

Co-Authored-By: Claude Sonnet 4.6 [email protected]

…cess

OSSHook._get_client now reads extra.endpoint from the connection config,
enabling VPC internal endpoints (e.g. oss-cn-hangzhou-internal.aliyuncs.com)
instead of being limited to public endpoints.

Fixes: scheduler/task pods in the same VPC as OSS can now use the
lower-latency internal endpoint, avoiding unnecessary public traffic.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@boring-cyborg
Copy link
Copy Markdown

boring-cyborg Bot commented May 7, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example Dag that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

OSSTaskHandler._read() calls self.oss_log_exists() and self.oss_read()
but these methods are only defined in OSSRemoteLogIO base class, not in
OSSTaskHandler itself. This causes AttributeError when the UI tries to
read task logs from OSS after a task completes.

Add oss_log_exists() and oss_read() wrapper methods to OSSTaskHandler
that proxy to self.io (OSSRemoteLogIO instance).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant